Risk-sensitive reinforcement learning algorithms with generalized average criterion 風險敏感度激勵學習的廣義平均算法
A reinforcement learning algorithm based on process reward and prioritized sweeping is presented as an interference-resolution strategy for multi-robot systems. 本文提出了基于過程獎賞和優先掃除的強化學習算法作為多機器人系統的沖突消解策略。
(4) A new cooperation model called MACM is presented and, based on this model, an improved distributed reinforcement learning algorithm is also proposed. (4)提出一種新的多agent協作模型MACM及一種改進的分布式強化學習算法。
In the first chapter of this paper, a comprehensive survey of research on reinforcement learning algorithms, theory, and applications is provided. Recent developments and future directions in mobile robot navigation are also discussed. 本文的第一章對增強學習理論、算法和應用研究的發展情況進行了全面深入的綜述評論,同時分析了移動機器人導航控制的研究現狀和發展趨勢。
Reinforcement learning has been applied successfully to single-agent environments. Because of the theoretical limitation that it assumes a Markovian environment, traditional reinforcement learning algorithms cannot be applied directly to multi-agent systems. 由于強化學習理論的限制,在多智能體系統中馬爾科夫過程模型不再適用,因此不能把強化學習直接用于多智能體的協作學習問題。